    Fragrep: An Efficient Search Tool for Fragmented Patterns in Genomic Sequences

    Many classes of non-coding RNAs (ncRNAs), including Y RNAs, vault RNAs, RNase P RNAs, and MRP RNAs, as well as a novel class recently discovered in Dictyostelium discoideum, can be characterized by a pattern of short but well-conserved sequence elements separated by poorly conserved regions of sometimes highly variable length. Local alignment algorithms such as BLAST are therefore ill-suited for discovering new homologs of such ncRNAs in genomic sequences. Fragrep instead implements an efficient algorithm for detecting pattern fragments that occur in a given order. For each pattern fragment, the mismatch tolerance and the bounds on the length of the intervening sequences can be specified separately. Furthermore, matches can be ranked by a statistically well-motivated scoring scheme.
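
    The ordered-fragment search described above can be illustrated with a naive Python sketch, assuming a pattern given as an ordered list of fragments, each with its own substitution tolerance and a minimum/maximum spacer length before it. The `Fragment` class and the `find_chains` and `mismatches` functions are hypothetical names chosen for illustration. Mismatches are counted as substitutions only, and the exhaustive backtracking below is neither Fragrep's efficient algorithm nor its statistical scoring scheme.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Fragment:
    # Hypothetical pattern element, for illustration only (not Fragrep's format).
    seq: str             # conserved fragment sequence
    max_mismatches: int  # substitutions tolerated within this fragment
    min_gap: int = 0     # minimum spacer length before this fragment (ignored for the first)
    max_gap: int = 0     # maximum spacer length before this fragment (ignored for the first)


def mismatches(text: str, start: int, frag: str) -> int:
    """Count substitutions between frag and text[start:start + len(frag)]."""
    return sum(1 for a, b in zip(text[start:start + len(frag)], frag) if a != b)


def find_chains(text: str, pattern: List[Fragment]) -> List[Tuple[int, ...]]:
    """Return the start positions of every chain of fragments matched in order."""
    results: List[Tuple[int, ...]] = []

    def extend(i: int, prev_end: int, starts: Tuple[int, ...]) -> None:
        if i == len(pattern):
            results.append(starts)
            return
        frag = pattern[i]
        last_start = len(text) - len(frag.seq)  # fragment must fit inside text
        if i == 0:
            candidates = range(0, last_start + 1)
        else:
            # Only positions whose spacer length lies within [min_gap, max_gap].
            lo = prev_end + frag.min_gap
            hi = min(prev_end + frag.max_gap, last_start)
            candidates = range(lo, hi + 1)
        for pos in candidates:
            if mismatches(text, pos, frag.seq) <= frag.max_mismatches:
                extend(i + 1, pos + len(frag.seq), starts + (pos,))

    extend(0, 0, ())
    return results
```

    For example, `find_chains("AAGGCCTTTACGTAA", [Fragment("GGCC", 0), Fragment("ACGT", 1, 2, 4)])` returns `[(2, 9)]`: the first fragment matches exactly at position 2, and the second matches exactly after a 3-nucleotide spacer. Restricting each later fragment to its spacer window is what keeps the search from degenerating into a scan of the whole sequence for every fragment.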

    Comparative Analysis of Cyclic Sequences: Viroids and other Small Circular RNAs

    The analysis of small circular sequences requires specialized tools. The differences between linear and circular sequences can be neglected for long molecules such as bacterial genomes, since in practice all analysis is performed in sequence windows. This is not true for viroids and related sequences, which are usually only a few hundred base pairs long. In this contribution we present basic algorithms and corresponding software for circular RNAs. In particular, we discuss pairwise and multiple cyclic sequence alignment with affine gap costs, and an extension of a recent approach to circular RNA folding to the computation of consensus structures.
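
    The basic difficulty of cyclic alignment can be seen in a simplified Python sketch that uses unit edit costs instead of the affine gap costs discussed above. Cutting an optimal circular alignment just before the column that contains the first character of `a` yields a linear alignment of `a` with some rotation of `b`, so with unit costs it suffices to keep `a` fixed and minimize over all rotations of `b`. With affine gap costs, such a cut can split a gap into two and needs extra bookkeeping, which is not shown. The function names are hypothetical, and this naive O(n·m²) loop is not the authors' algorithm; faster methods exist.

```python
def edit_distance(a: str, b: str) -> int:
    """Unit-cost Levenshtein distance with the standard O(len(a) * len(b)) DP."""
    prev = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        cur = [i] + [0] * len(b)
        for j in range(1, len(b) + 1):
            cur[j] = min(
                prev[j] + 1,                             # delete a[i-1]
                cur[j - 1] + 1,                          # insert b[j-1]
                prev[j - 1] + (a[i - 1] != b[j - 1]),    # match or substitution
            )
        prev = cur
    return prev[-1]


def cyclic_edit_distance(a: str, b: str) -> int:
    """Unit-cost distance between circular strings: a stays fixed, b is rotated."""
    if not b:
        return len(a)
    return min(edit_distance(a, b[k:] + b[:k]) for k in range(len(b)))
```

    For instance, `cyclic_edit_distance("ACGU", "GUAC")` is 0, because rotating "GUAC" by two positions gives "ACGU", whereas the linear distance between the two strings is nonzero.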

    U7 snRNAs

    U7 small nuclear RNA (snRNA) sequences have previously been described for only a handful of animal species. Here we describe a computational search for functional U7 snRNA genes throughout vertebrates, including the upstream sequence elements characteristic of snRNAs transcribed by polymerase II. Based on the results of this search, we discuss the high variability of U7 snRNAs in both sequence and structure, and report on an attempt to find U7 snRNA sequences in basal deuterostome and non-drosophilid insect genomes using a combination of sequence, structure, and promoter features. Because U7 snRNAs are extremely short and highly variable in both sequence and structure, no unambiguous candidates were found. These results cast doubt on the putative U7 homologs in even more distant organisms that are reported in the most recent release of the Rfam database.

    Evolution of the vertebrate Y RNA cluster

    Relatively little is known about the evolutionary histories of most classes of non-protein-coding RNAs. Here we consider Y RNAs, a rarely studied group of related pol-III transcripts. A single cluster of functional genes is preserved throughout tetrapod evolution; this cluster, however, exhibits clade-specific tandem duplications, gene losses, and rearrangements.

    Shape Decomposition Algorithms for Laser Capture Microdissection

    In the context of biomarker discovery and the molecular characterization of diseases, laser capture microdissection is a highly effective approach for extracting disease-specific regions from complex, heterogeneous tissue samples. These regions have to be decomposed into feasible fragments, because each fragment must satisfy certain size and morphology constraints for the extraction to succeed. We model this constrained shape decomposition problem as the computation of optimal feasible decompositions of simple polygons. We use a skeleton-based approach and present an algorithmic framework in which various feasibility criteria and optimization goals can be implemented. Motivated by our application, we consider different constraints and examine the resulting fragmentations. Furthermore, we apply our method to lung tissue samples and show its advantages over a heuristic decomposition approach.
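
    The skeleton-based decomposition itself is not reproduced here. As a hedged illustration of what a pluggable feasibility criterion might look like, the Python sketch below checks one polygonal fragment against a size constraint (area via the shoelace formula) and a simple morphology constraint (the isoperimetric quotient 4πA/P², which is 1 for a circle and approaches 0 for thin slivers). The function names, the choice of compactness measure, and the thresholds are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def area(poly: List[Point]) -> float:
    """Area of a simple polygon (vertices in order) via the shoelace formula."""
    s = 0.0
    for i in range(len(poly)):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % len(poly)]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0


def perimeter(poly: List[Point]) -> float:
    """Sum of edge lengths, including the closing edge."""
    return sum(math.dist(poly[i], poly[(i + 1) % len(poly)]) for i in range(len(poly)))


def is_feasible(poly: List[Point], min_area: float, max_area: float,
                min_compactness: float) -> bool:
    """Hypothetical feasibility test: area within bounds and shape not too elongated."""
    a, p = area(poly), perimeter(poly)
    compactness = 4.0 * math.pi * a / (p * p) if p > 0 else 0.0
    return min_area <= a <= max_area and compactness >= min_compactness
```

    A 10 × 10 square (compactness π/4 ≈ 0.79) passes `is_feasible(square, 50, 200, 0.5)`, while a 100 × 1 strip of the same area fails it, which is the kind of distinction a size-and-morphology constraint is meant to capture.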

    Tanimoto's best barbecue: discovering regulatory modules using Tanimoto scores

    We present a combinatorial method for discovering cis-regulatory modules in promoter sequences. Our approach combines “sliding window” approaches with a scoring function based on the so-called Tanimoto score. This makes it possible to identify sets of binding sites that preferentially occur near each other in a given set of promoter sequences belonging to co-expressed or orthologous genes. We benchmark our method on a data set derived from muscle-specific genes and show that it recovers modules that previous studies identified as functional.
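
    The Tanimoto score of two sets is |A ∩ B| / |A ∪ B|. A minimal Python sketch for pairs of binding sites follows. It assumes a hypothetical input format (promoter id mapped to a list of (site name, position) hits) and counts a promoter as shared only if the two sites lie within `window` bp of each other, as a stand-in for the sliding window. This windowed variant, |C| / (|A| + |B| − |C|) with C the co-occurring promoters, reduces to the plain Tanimoto score when every shared promoter passes the window test. It is an illustration, not the authors' scoring function, and it scores pairs only, not larger modules.

```python
from itertools import combinations
from typing import Dict, List, Tuple

# Hypothetical input: promoter id -> list of (binding-site name, position) hits.
Promoters = Dict[str, List[Tuple[str, int]]]


def positions(hits: List[Tuple[str, int]], site: str) -> List[int]:
    return [pos for name, pos in hits if name == site]


def co_occur(hits: List[Tuple[str, int]], a: str, b: str, window: int) -> bool:
    """True if some hit of site a lies within `window` bp of some hit of site b."""
    return any(abs(x - y) <= window
               for x in positions(hits, a) for y in positions(hits, b))


def windowed_tanimoto(promoters: Promoters, a: str, b: str, window: int) -> float:
    with_a = {pid for pid, hits in promoters.items() if positions(hits, a)}
    with_b = {pid for pid, hits in promoters.items() if positions(hits, b)}
    both = {pid for pid in with_a & with_b
            if co_occur(promoters[pid], a, b, window)}
    denom = len(with_a) + len(with_b) - len(both)
    return len(both) / denom if denom else 0.0


def rank_pairs(promoters: Promoters, window: int):
    """Score every pair of site names and sort by descending score."""
    sites = sorted({name for hits in promoters.values() for name, _ in hits})
    scored = [((a, b), windowed_tanimoto(promoters, a, b, window))
              for a, b in combinations(sites, 2)]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

    With four toy promoters in which MEF2 and SRF both occur in three, but lie within 50 bp of each other in only two, `windowed_tanimoto(promoters, "MEF2", "SRF", 50)` gives 2 / (3 + 3 − 2) = 0.5.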

    Grayscale representation of infrared microscopy images by Extended Multiplicative Signal Correction for registration with histological images

    Fourier-transform infrared (FTIR) microspectroscopy is close to becoming a label-free routine method for cancer diagnosis. To build classifiers based on infrared spectra, infrared images need to be registered with hematoxylin and eosin (H&E) stained histological images. While FTIR images have a deep spectral domain with thousands of channels carrying chemical and scatter information, H&E images have only three color channels per pixel and carry mainly morphological information. Therefore, representations of infrared images are needed that match the morphological information in H&E images. In this paper, we propose a novel representation of FTIR images based on extended multiplicative signal correction (EMSC) that highlights morphological features shown to correlate well with the morphological information in H&E images. Based on the obtained representations, we developed a global-to-local registration strategy for FTIR images and H&E stained histological images of parallel tissue sections.
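
    A basic EMSC model writes each measured spectrum z as a + b·m + Σₖ dₖ·νᵏ plus a residual, where m is a reference spectrum and ν is the (rescaled) wavenumber. The parameters are fitted per pixel by least squares, and the corrected spectrum is (z − a − Σₖ dₖ·νᵏ) / b. The NumPy sketch below fits this model and maps one fitted parameter image to 8-bit gray levels. Which parameter, or which combination of parameters, the paper uses for its grayscale representation is not stated here, so the choice of map is an assumption. The function names and the polynomial order are hypothetical.

```python
import numpy as np


def emsc(spectra, reference, wavenumbers, poly_order=2):
    """Fit z = a + b*m + sum_k d_k * nu**k per spectrum (basic EMSC, by least squares).

    spectra: (n_spectra, n_channels); reference and wavenumbers: (n_channels,) arrays.
    Returns (coeffs, corrected), with coeffs columns [a, b, d_1, ..., d_poly_order].
    """
    spectra = np.atleast_2d(np.asarray(spectra, dtype=float))
    # Rescale wavenumbers to [-1, 1] so the polynomial basis is well conditioned.
    nu = np.interp(wavenumbers, (wavenumbers.min(), wavenumbers.max()), (-1.0, 1.0))
    design = np.column_stack(
        [np.ones_like(nu), reference] + [nu ** k for k in range(1, poly_order + 1)]
    )
    coeffs, *_ = np.linalg.lstsq(design, spectra.T, rcond=None)
    coeffs = coeffs.T                               # (n_spectra, 2 + poly_order)
    b = coeffs[:, 1:2]                              # multiplicative term, kept 2-D
    baseline = coeffs @ design.T - b * reference    # a + polynomial part of the fit
    corrected = (spectra - baseline) / b
    return coeffs, corrected


def to_grayscale(values):
    """Linearly map a parameter map (e.g. the fitted b per pixel) to uint8 gray levels."""
    values = np.asarray(values, dtype=float)
    lo, hi = float(values.min()), float(values.max())
    if hi == lo:
        return np.zeros(values.shape, dtype=np.uint8)
    return np.round(255 * (values - lo) / (hi - lo)).astype(np.uint8)
```

    For an image of h × w pixels, one would reshape the spectra to (h·w, n_channels), fit them, and call `to_grayscale(coeffs[:, 1].reshape(h, w))` to obtain an image that can be registered against the H&E section.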

    Characterization of statistical features for plant microRNA prediction

    Background: Several tools are available to identify miRNAs from deep-sequencing data; however, only a few of them, such as miRDeep, can identify novel miRNAs and are also available as standalone applications. Given the differences between plant and animal miRNAs, particularly in the distribution of hairpin lengths and the nature of complementarity with the duplex partner (miRNA star), the underlying statistical features of miRDeep, and of other tools using similar features, are likely to be affected.

    Results: The potential effects on features such as minimum free energy, stability of secondary structures, and excision length were examined, and the parameters of the features showing sizable changes were estimated for plant-specific miRNAs. We found that most of these features acquired a new set of values or distributions for plant-specific miRNAs. While the stretch of conserved positions (nucleus) in mature miRNAs was relatively longer in plants, the difference in the distribution of minimum free energy between real and background hairpins was marginal. However, the choice of source species for the background sequences was found to affect both the minimum free energy and miRNA hairpin stability. The new parameters were tested on an Illumina dataset from maize seedlings, and the results were compared with those obtained using the default parameters. The newly parameterized model showed much improved specificity and sensitivity compared with its default counterpart.

    Conclusions: In summary, the present study reports the behavior of a few general and tool-specific statistical features for improving the prediction accuracy of plant miRNAs from deep-sequencing data.